Ch2: Small worlds

Jesse Brunner

What do we mean by small worlds?

Simply this:

  • we are working with models
  • models are simplifications of reality
    • they are incomplete
    • they are myopic

Probability as counting

  • For each possible explanation of the data (i.e., possible value of \(p\))
  • Count all of the ways the data could happen (i.e., plausibility of data | \(p_{proposed}\))
  • Explanations with more ways to produce them are more plausible
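The counting steps above can be brute-forced on a toy example. This is a minimal Python sketch assuming a hypothetical four-sided "globe" where \(w\) of the four sides are water, with observed data W, L, W (both the four-sided setup and the data are illustrative, not from the slides):

```python
from itertools import product

# Hypothetical toy setup: a four-sided "globe" with w water sides.
data = ["W", "L", "W"]  # observed sequence of flips (illustrative)

counts = {}
for w in range(5):  # each conjecture: w water sides out of 4
    sides = ["W"] * w + ["L"] * (4 - w)
    # Enumerate every possible sequence of side-draws and count
    # how many ways each conjecture could have produced the data.
    counts[w] = sum(
        1 for seq in product(sides, repeat=len(data)) if list(seq) == data
    )
print(counts)  # conjectures with more ways to produce the data are more plausible
```

Each side is a distinct "path," so two sides with the same label count as two different ways, exactly as in the counting argument.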

Probability as counting

Most formulas for probabilities are just short-cuts for counting (or integrating). E.g., the binomial distribution:

\[\begin{align} \Pr(k \text{ of } n | p) &= \binom{n}{k}p^k(1-p)^{n-k} \\ &= \frac{n!}{k!(n-k)!} p^k(1-p)^{n-k} \end{align}\]

Note: This formula provides us a short-cut for calculating the probability of observing \(k\) of \(n\) “successes” given a particular value of the parameter \(p\).
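The shortcut can be sanity-checked by brute force: enumerate every possible sequence of \(n\) flips, keep those with exactly \(k\) successes, sum their probabilities, and compare to the formula. A minimal Python sketch (the values of \(n\), \(k\), and \(p\) are arbitrary illustrations):

```python
from itertools import product
from math import comb

n, k, p = 5, 2, 0.3  # illustrative values

# Brute force: sum the probability of every length-n sequence
# (1 = success, 0 = failure) that contains exactly k successes.
brute = sum(
    p ** seq.count(1) * (1 - p) ** seq.count(0)
    for seq in product([0, 1], repeat=n)
    if seq.count(1) == k
)

# The binomial formula: same number, without enumerating sequences.
formula = comb(n, k) * p ** k * (1 - p) ** (n - k)
print(brute, formula)
```

The \(\binom{n}{k}\) term is doing the counting: it tallies how many of the enumerated sequences contain exactly \(k\) successes.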

Don’t fret! Just think of constraints!

We choose statistical distributions because of

  1. Theory (someone else figured it out for you!)

  2. Constraints (stuff you know)

    • Discrete vs. continuous?
    • Bounded? Positive?
    • A/symmetric?

We will learn along the way

Bayes theorem

\[\begin{align} \Pr(p_i \mid k) &= \frac{\text{Probability of data} \mid p_i \times \text{Prior probability of } p_i}{\text{Probability of the data overall}} \end{align}\]

But let’s not get caught up in the math! (Watch the 3B1B video… it’s better!)

A Bayesian modeling perspective

  • Generative models are Bayesian models
    • starting from there takes longer, but is more powerful
  • Workflow is important
  • Variables: data are observed variables, parameters are unobserved variables
  • Distributions: likelihoods and priors are both distributions
  • Indexing: there are no random or fixed effects, just indexed variables

Parts of the model

  • Describe the distribution of the data | parameters (AKA likelihood)
    • \(W \sim \text{binomial}(n, p)\) in globe-flipping example
      • \(W\) is the number of times the thumb was on water
      • \(n\) is number of flips
      • \(p\) is the parameter we are trying to estimate
    • can also involve description of how parameters relate
      • E.g., \(\text{logit}(p_i) = \alpha + \beta \times x_i\)
  • Description of the distribution of parameters prior to observing the data
    • \(p \sim \text{beta}(1, 2)\)
    • What is possible (and more/less probable) prior to observing data?
  • Some engine to do the calculations
    • integration (nope) & conjugate priors (sometimes you get lucky!)
    • grid approximation
    • quadratic approximation (quap())
    • MCMC: WinBUGS, JAGS, Stan (ulam())
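As a concrete instance of the grid-approximation engine, here is a minimal sketch assembling the parts above: a binomial likelihood, a beta(1, 2) prior (density proportional to \(1 - p\)), and a grid of candidate \(p\) values. The slides' quap() and ulam() come from R's rethinking package; this Python version is purely illustrative, and the data (W = 6 of n = 9) are assumed:

```python
from math import comb

W, n = 6, 9  # illustrative data: 6 waters in 9 flips
G = 1000     # number of grid points

grid = [i / (G - 1) for i in range(G)]           # candidate values of p
prior = [1 - p for p in grid]                    # beta(1, 2), up to a constant
like = [comb(n, W) * p ** W * (1 - p) ** (n - W) for p in grid]

unstd = [l * pr for l, pr in zip(like, prior)]   # likelihood x prior
total = sum(unstd)                               # "probability of the data overall"
posterior = [u / total for u in unstd]

post_mean = sum(p * w for p, w in zip(grid, posterior))
print(round(post_mean, 3))  # close to the analytic beta(7, 5) posterior mean, 7/12
```

Because the beta prior is conjugate to the binomial likelihood, this particular posterior is known exactly (beta(7, 5)), which makes it a handy check that the grid engine works before trusting it on models without closed forms.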